Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 935 |
| Missing cells | 272 |
| Missing cells (%) | 2.6% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 80.5 KiB |
| Average record size in memory | 88.1 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 2 |
| Alert | Type |
|---|---|
| IQ is highly overall correlated with educ and 1 other field | High correlation |
| educ is highly overall correlated with IQ and 1 other field | High correlation |
| meduc is highly overall correlated with feduc | High correlation |
| feduc is highly overall correlated with meduc | High correlation |
| exper is highly overall correlated with educ and 2 other fields | High correlation |
| tenure is highly overall correlated with exper | High correlation |
| age is highly overall correlated with exper | High correlation |
| black is highly overall correlated with IQ | High correlation |
| meduc has 78 (8.3%) missing values | Missing |
| feduc has 194 (20.7%) missing values | Missing |
| tenure has 30 (3.2%) zeros | Zeros |
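The missing-value and zero percentages in the overview and alerts can be cross-checked with simple arithmetic. The sketch below assumes that percentages are rounded to one decimal and that every column other than meduc and feduc has no missing values, as the per-variable sections report. The helper name `pct` is purely illustrative.

```python
# Figures copied from the report; nothing here is recomputed from raw data.
n_rows = 935
n_cols = 11
missing_per_column = {"meduc": 78, "feduc": 194}  # all other columns report 0 missing


def pct(part, whole):
    """Percentage rounded to one decimal, matching the report's display."""
    return round(100 * part / whole, 1)


# Cell-level share: missing cells over all cells (rows x columns).
missing_cells = sum(missing_per_column.values())
cell_pct = pct(missing_cells, n_rows * n_cols)

# Column-level shares: missing (or zero) values over the number of rows.
column_pct = {name: pct(count, n_rows) for name, count in missing_per_column.items()}
tenure_zero_pct = pct(30, n_rows)

print(missing_cells, cell_pct, column_pct, tenure_zero_pct)
```

The cell-level figure divides by rows times columns, while the per-column figures divide by rows only, which is why 272 missing cells amount to 2.6% overall even though feduc alone is 20.7% missing.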
Reproduction
| Analysis started | 2023-01-25 22:04:57.845830 |
|---|---|
| Analysis finished | 2023-01-25 22:05:14.720398 |
| Duration | 16.87 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
wage
Real number (ℝ)
| Distinct | 449 |
|---|---|
| Distinct (%) | 48.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 95794.545 |
| Minimum | 11500 |
| Maximum | 307800 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 11500 |
|---|---|
| 5-th percentile | 43790 |
| Q1 | 66900 |
| median | 90500 |
| Q3 | 116000 |
| 95-th percentile | 169550 |
| Maximum | 307800 |
| Range | 296300 |
| Interquartile range (IQR) | 49100 |
Descriptive statistics
| Standard deviation | 40436.082 |
|---|---|
| Coefficient of variation (CV) | 0.42211257 |
| Kurtosis | 2.7175818 |
| Mean | 95794.545 |
| Median Absolute Deviation (MAD) | 24900 |
| Skewness | 1.2011868 |
| Sum | 89567900 |
| Variance | 1.6350767 × 10⁹ |
| Monotonicity | Not monotonic |
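Several of the wage statistics are functions of one another, so they can be sanity-checked from the reported values alone. This is a minimal sketch using the rounded figures printed in the report, so the results agree only up to rounding. The variable names are illustrative.

```python
import math

# Rounded values as printed in the wage section of the report.
mean = 95794.545
std = 40436.082
q1, q3 = 66900, 116000
minimum, maximum = 11500, 307800
n = 935

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all observations

print(f"CV={cv:.6f} variance={variance:.4e} IQR={iqr} range={value_range} sum={total:.0f}")
assert math.isclose(variance, 1.6350767e9, rel_tol=1e-6)
```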
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100000 | 31 | 3.3% |
| 125000 | 17 | 1.8% |
| 80000 | 15 | 1.6% |
| 50000 | 13 | 1.4% |
| 96200 | 13 | 1.4% |
| 144200 | 12 | 1.3% |
| 60000 | 11 | 1.2% |
| 90000 | 11 | 1.2% |
| 75000 | 11 | 1.2% |
| 120000 | 10 | 1.1% |
| Other values (439) | 791 | 84.6% |
| Value | Count | Frequency (%) |
| 11500 | 1 | 0.1% |
| 20000 | 1 | 0.1% |
| 23300 | 1 | 0.1% |
| 26000 | 1 | 0.1% |
| 26500 | 1 | 0.1% |
| 28900 | 1 | 0.1% |
| 30000 | 1 | 0.1% |
| 31000 | 1 | 0.1% |
| 31800 | 1 | 0.1% |
| 32500 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 307800 | 2 | 0.2% |
| 277100 | 1 | 0.1% |
| 266800 | 1 | 0.1% |
| 250000 | 2 | 0.2% |
| 240400 | 2 | 0.2% |
| 231000 | 1 | 0.1% |
| 230800 | 1 | 0.1% |
| 216200 | 3 | 0.3% |
| 213700 | 4 | 0.4% |
| 209900 | 1 | 0.1% |
hours
Real number (ℝ)
| Distinct | 37 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.929412 |
| Minimum | 20 |
| Maximum | 80 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 38 |
| Q1 | 40 |
| median | 40 |
| Q3 | 48 |
| 95-th percentile | 60 |
| Maximum | 80 |
| Range | 60 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 7.2242559 |
|---|---|
| Coefficient of variation (CV) | 0.16445146 |
| Kurtosis | 4.1866405 |
| Mean | 43.929412 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.5961752 |
| Sum | 41074 |
| Variance | 52.189873 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=37)
| Value | Count | Frequency (%) |
| 40 | 497 | 53.2% |
| 45 | 97 | 10.4% |
| 50 | 91 | 9.7% |
| 55 | 41 | 4.4% |
| 48 | 35 | 3.7% |
| 60 | 32 | 3.4% |
| 44 | 19 | 2.0% |
| 38 | 15 | 1.6% |
| 43 | 14 | 1.5% |
| 35 | 11 | 1.2% |
| Other values (27) | 83 | 8.9% |
| Value | Count | Frequency (%) |
| 20 | 1 | 0.1% |
| 23 | 1 | 0.1% |
| 24 | 1 | 0.1% |
| 25 | 1 | 0.1% |
| 27 | 2 | 0.2% |
| 30 | 7 | 0.7% |
| 32 | 5 | 0.5% |
| 34 | 1 | 0.1% |
| 35 | 11 | 1.2% |
| 36 | 4 | 0.4% |
| Value | Count | Frequency (%) |
| 80 | 4 | 0.4% |
| 75 | 2 | 0.2% |
| 70 | 7 | 0.7% |
| 65 | 7 | 0.7% |
| 64 | 1 | 0.1% |
| 61 | 1 | 0.1% |
| 60 | 32 | 3.4% |
| 59 | 1 | 0.1% |
| 58 | 2 | 0.2% |
| 56 | 3 | 0.3% |
IQ
Real number (ℝ)
| Distinct | 80 |
|---|---|
| Distinct (%) | 8.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 101.28235 |
| Minimum | 50 |
| Maximum | 145 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 50 |
|---|---|
| 5-th percentile | 74 |
| Q1 | 92 |
| median | 102 |
| Q3 | 112 |
| 95-th percentile | 124.3 |
| Maximum | 145 |
| Range | 95 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 15.052636 |
|---|---|
| Coefficient of variation (CV) | 0.14862052 |
| Kurtosis | -0.016643599 |
| Mean | 101.28235 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | -0.34097187 |
| Sum | 94699 |
| Variance | 226.58186 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 96 | 35 | 3.7% |
| 104 | 35 | 3.7% |
| 109 | 33 | 3.5% |
| 98 | 30 | 3.2% |
| 97 | 28 | 3.0% |
| 110 | 28 | 3.0% |
| 105 | 27 | 2.9% |
| 106 | 26 | 2.8% |
| 101 | 23 | 2.5% |
| 108 | 22 | 2.4% |
| Other values (70) | 648 | 69.3% |
| Value | Count | Frequency (%) |
| 50 | 1 | 0.1% |
| 54 | 1 | 0.1% |
| 55 | 1 | 0.1% |
| 59 | 1 | 0.1% |
| 60 | 1 | 0.1% |
| 61 | 1 | 0.1% |
| 62 | 2 | 0.2% |
| 63 | 1 | 0.1% |
| 64 | 2 | 0.2% |
| 65 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 145 | 1 | 0.1% |
| 137 | 1 | 0.1% |
| 134 | 4 | 0.4% |
| 132 | 5 | 0.5% |
| 131 | 4 | 0.4% |
| 130 | 3 | 0.3% |
| 129 | 4 | 0.4% |
| 128 | 4 | 0.4% |
| 127 | 6 | 0.6% |
| 126 | 4 | 0.4% |
educ
Real number (ℝ)
| Distinct | 10 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.468449 |
| Minimum | 9 |
| Maximum | 18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 12 |
| median | 12 |
| Q3 | 16 |
| 95-th percentile | 18 |
| Maximum | 18 |
| Range | 9 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.1966539 |
|---|---|
| Coefficient of variation (CV) | 0.16309627 |
| Kurtosis | -0.73486269 |
| Mean | 13.468449 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.5486765 |
| Sum | 12593 |
| Variance | 4.8252883 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) |
| 12 | 393 | 42.0% |
| 16 | 150 | 16.0% |
| 13 | 85 | 9.1% |
| 14 | 77 | 8.2% |
| 18 | 57 | 6.1% |
| 15 | 45 | 4.8% |
| 11 | 43 | 4.6% |
| 17 | 40 | 4.3% |
| 10 | 35 | 3.7% |
| 9 | 10 | 1.1% |
| Value | Count | Frequency (%) |
| 9 | 10 | 1.1% |
| 10 | 35 | 3.7% |
| 11 | 43 | 4.6% |
| 12 | 393 | 42.0% |
| 13 | 85 | 9.1% |
| 14 | 77 | 8.2% |
| 15 | 45 | 4.8% |
| 16 | 150 | 16.0% |
| 17 | 40 | 4.3% |
| 18 | 57 | 6.1% |
| Value | Count | Frequency (%) |
| 18 | 57 | 6.1% |
| 17 | 40 | 4.3% |
| 16 | 150 | 16.0% |
| 15 | 45 | 4.8% |
| 14 | 77 | 8.2% |
| 13 | 85 | 9.1% |
| 12 | 393 | 42.0% |
| 11 | 43 | 4.6% |
| 10 | 35 | 3.7% |
| 9 | 10 | 1.1% |
exper
Real number (ℝ)
| Distinct | 22 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.563636 |
| Minimum | 1 |
| Maximum | 23 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 8 |
| median | 11 |
| Q3 | 15 |
| 95-th percentile | 19 |
| Maximum | 23 |
| Range | 22 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.3745864 |
|---|---|
| Coefficient of variation (CV) | 0.37830543 |
| Kurtosis | -0.56379545 |
| Mean | 11.563636 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.077800885 |
| Sum | 10812 |
| Variance | 19.137006 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) |
| 11 | 89 | 9.5% |
| 9 | 82 | 8.8% |
| 8 | 72 | 7.7% |
| 10 | 72 | 7.7% |
| 16 | 68 | 7.3% |
| 12 | 65 | 7.0% |
| 13 | 62 | 6.6% |
| 15 | 60 | 6.4% |
| 7 | 54 | 5.8% |
| 14 | 54 | 5.8% |
| Other values (12) | 257 | 27.5% |
| Value | Count | Frequency (%) |
| 1 | 12 | 1.3% |
| 3 | 1 | 0.1% |
| 4 | 29 | 3.1% |
| 5 | 30 | 3.2% |
| 6 | 48 | 5.1% |
| 7 | 54 | 5.8% |
| 8 | 72 | 7.7% |
| 9 | 82 | 8.8% |
| 10 | 72 | 7.7% |
| 11 | 89 | 9.5% |
| Value | Count | Frequency (%) |
| 23 | 2 | 0.2% |
| 22 | 3 | 0.3% |
| 21 | 12 | 1.3% |
| 20 | 14 | 1.5% |
| 19 | 23 | 2.5% |
| 18 | 30 | 3.2% |
| 17 | 53 | 5.7% |
| 16 | 68 | 7.3% |
| 15 | 60 | 6.4% |
| 14 | 54 | 5.8% |
tenure
Real number (ℝ)
| Distinct | 23 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.2342246 |
| Minimum | 0 |
| Maximum | 22 |
| Zeros | 30 |
| Zeros (%) | 3.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 7 |
| Q3 | 11 |
| 95-th percentile | 16 |
| Maximum | 22 |
| Range | 22 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 5.0752058 |
|---|---|
| Coefficient of variation (CV) | 0.70155491 |
| Kurtosis | -0.79859858 |
| Mean | 7.2342246 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.4325322 |
| Sum | 6764 |
| Variance | 25.757714 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=23)
| Value | Count | Frequency (%) |
| 1 | 104 | 11.1% |
| 2 | 93 | 9.9% |
| 3 | 72 | 7.7% |
| 9 | 71 | 7.6% |
| 5 | 68 | 7.3% |
| 4 | 59 | 6.3% |
| 10 | 58 | 6.2% |
| 7 | 56 | 6.0% |
| 12 | 53 | 5.7% |
| 8 | 48 | 5.1% |
| Other values (13) | 253 | 27.1% |
| Value | Count | Frequency (%) |
| 0 | 30 | 3.2% |
| 1 | 104 | 11.1% |
| 2 | 93 | 9.9% |
| 3 | 72 | 7.7% |
| 4 | 59 | 6.3% |
| 5 | 68 | 7.3% |
| 6 | 19 | 2.0% |
| 7 | 56 | 6.0% |
| 8 | 48 | 5.1% |
| 9 | 71 | 7.6% |
| Value | Count | Frequency (%) |
| 22 | 1 | 0.1% |
| 21 | 2 | 0.2% |
| 20 | 4 | 0.4% |
| 19 | 6 | 0.6% |
| 18 | 14 | 1.5% |
| 17 | 9 | 1.0% |
| 16 | 22 | 2.4% |
| 15 | 38 | 4.1% |
| 14 | 28 | 3.0% |
| 13 | 40 | 4.3% |
age
Real number (ℝ)
| Distinct | 11 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.080214 |
| Minimum | 28 |
| Maximum | 38 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 28 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 30 |
| median | 33 |
| Q3 | 36 |
| 95-th percentile | 38 |
| Maximum | 38 |
| Range | 10 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.1078033 |
|---|---|
| Coefficient of variation (CV) | 0.093947496 |
| Kurtosis | -1.257094 |
| Mean | 33.080214 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.11873587 |
| Sum | 30930 |
| Variance | 9.6584411 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
| Value | Count | Frequency (%) |
| 30 | 120 | 12.8% |
| 32 | 99 | 10.6% |
| 38 | 99 | 10.6% |
| 31 | 98 | 10.5% |
| 36 | 95 | 10.2% |
| 29 | 86 | 9.2% |
| 37 | 82 | 8.8% |
| 33 | 81 | 8.7% |
| 34 | 69 | 7.4% |
| 35 | 61 | 6.5% |
| Value | Count | Frequency (%) |
| 28 | 45 | 4.8% |
| 29 | 86 | 9.2% |
| 30 | 120 | 12.8% |
| 31 | 98 | 10.5% |
| 32 | 99 | 10.6% |
| 33 | 81 | 8.7% |
| 34 | 69 | 7.4% |
| 35 | 61 | 6.5% |
| 36 | 95 | 10.2% |
| 37 | 82 | 8.8% |
| Value | Count | Frequency (%) |
| 38 | 99 | 10.6% |
| 37 | 82 | 8.8% |
| 36 | 95 | 10.2% |
| 35 | 61 | 6.5% |
| 34 | 69 | 7.4% |
| 33 | 81 | 8.7% |
| 32 | 99 | 10.6% |
| 31 | 98 | 10.5% |
| 30 | 120 | 12.8% |
| 29 | 86 | 9.2% |
married
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.4 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 935 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 835 | 89.3% |
| 0 | 100 | 10.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 835 | 89.3% |
| 0 | 100 | 10.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 935 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 835 | 89.3% |
| 0 | 100 | 10.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 935 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 835 | 89.3% |
| 0 | 100 | 10.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 935 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 835 | 89.3% |
| 0 | 100 | 10.7% |
black
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.4 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 935 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 815 | 87.2% |
| 1 | 120 | 12.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 815 | 87.2% |
| 1 | 120 | 12.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 935 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 815 | 87.2% |
| 1 | 120 | 12.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 935 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 815 | 87.2% |
| 1 | 120 | 12.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 935 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 815 | 87.2% |
| 1 | 120 | 12.8% |
meduc
Real number (ℝ)
| Distinct | 19 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 78 |
| Missing (%) | 8.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.682614 |
| Minimum | 0 |
| Maximum | 18 |
| Zeros | 3 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 8 |
| median | 12 |
| Q3 | 12 |
| 95-th percentile | 16 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.8497563 |
|---|---|
| Coefficient of variation (CV) | 0.26676583 |
| Kurtosis | 0.94405474 |
| Mean | 10.682614 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.4977403 |
| Sum | 9155 |
| Variance | 8.1211109 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=19)
| Value | Count | Frequency (%) |
| 12 | 357 | 38.2% |
| 8 | 129 | 13.8% |
| 10 | 65 | 7.0% |
| 11 | 56 | 6.0% |
| 9 | 47 | 5.0% |
| 16 | 42 | 4.5% |
| 7 | 31 | 3.3% |
| 6 | 30 | 3.2% |
| 14 | 28 | 3.0% |
| 13 | 21 | 2.2% |
| Other values (9) | 51 | 5.5% |
| (Missing) | 78 | 8.3% |
| Value | Count | Frequency (%) |
| 0 | 3 | 0.3% |
| 1 | 1 | 0.1% |
| 2 | 5 | 0.5% |
| 3 | 9 | 1.0% |
| 4 | 6 | 0.6% |
| 5 | 8 | 0.9% |
| 6 | 30 | 3.2% |
| 7 | 31 | 3.3% |
| 8 | 129 | 13.8% |
| 9 | 47 | 5.0% |
| Value | Count | Frequency (%) |
| 18 | 5 | 0.5% |
| 17 | 7 | 0.7% |
| 16 | 42 | 4.5% |
| 15 | 7 | 0.7% |
| 14 | 28 | 3.0% |
| 13 | 21 | 2.2% |
| 12 | 357 | 38.2% |
| 11 | 56 | 6.0% |
| 10 | 65 | 7.0% |
| 9 | 47 | 5.0% |
feduc
Real number (ℝ)
| Distinct | 18 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 194 |
| Missing (%) | 20.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.217274 |
| Minimum | 0 |
| Maximum | 18 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 8 |
| median | 10 |
| Q3 | 12 |
| 95-th percentile | 16 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 3.3006999 |
|---|---|
| Coefficient of variation (CV) | 0.32305094 |
| Kurtosis | -0.028311983 |
| Mean | 10.217274 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.043468976 |
| Sum | 7571 |
| Variance | 10.89462 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=18)
| Value | Count | Frequency (%) |
| 12 | 216 | 23.1% |
| 8 | 122 | 13.0% |
| 10 | 77 | 8.2% |
| 6 | 41 | 4.4% |
| 11 | 40 | 4.3% |
| 9 | 39 | 4.2% |
| 16 | 38 | 4.1% |
| 7 | 37 | 4.0% |
| 14 | 28 | 3.0% |
| 5 | 22 | 2.4% |
| Other values (8) | 81 | 8.7% |
| (Missing) | 194 | 20.7% |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 2 | 8 | 0.9% |
| 3 | 9 | 1.0% |
| 4 | 17 | 1.8% |
| 5 | 22 | 2.4% |
| 6 | 41 | 4.4% |
| 7 | 37 | 4.0% |
| 8 | 122 | 13.0% |
| 9 | 39 | 4.2% |
| 10 | 77 | 8.2% |
| Value | Count | Frequency (%) |
| 18 | 16 | 1.7% |
| 17 | 6 | 0.6% |
| 16 | 38 | 4.1% |
| 15 | 7 | 0.7% |
| 14 | 28 | 3.0% |
| 13 | 17 | 1.8% |
| 12 | 216 | 23.1% |
| 11 | 40 | 4.3% |
| 10 | 77 | 8.2% |
| 9 | 39 | 4.2% |
Auto
The auto setting selects an interpretable pairwise column metric according to the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramer's V, [0,1]
- Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
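The mapping above relies on Cramér's V for every pair that involves a categorical column. As a rough illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table of counts, following the correction usually attributed to Bergsma (2013). It is an assumption-laden sketch, not the library's own code: the function name `cramers_v_corrected` is hypothetical, the input is assumed to be a list of rows with no all-zero row or column, and the toy tables are made up.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts (hypothetical helper)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


independent = cramers_v_corrected([[10, 10], [10, 10]])  # no association
perfect = cramers_v_corrected([[50, 0], [0, 50]])        # perfect association
print(independent, perfect)
```

For a Numerical-Categorical pair, the numerical column would first be discretized into bins so that a contingency table like the ones above can be built.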
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
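The calculation described for ρ can be sketched in a few lines of standard-library Python: rank both variables, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson` and `spearman` and the toy data are illustrative. Ties are assumed to receive the average of the ranks they span, which is a common convention but not something the report states.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

On this toy input ρ is 1 (up to floating-point error) because the ordering is preserved exactly, while Pearson's r stays below 1 because the relationship is not linear.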
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
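The location and scale invariance mentioned above is easy to check numerically. The sketch below uses only the educ and IQ values from the first ten sample rows shown in this report, so the coefficient it prints describes those ten rows and not the full dataset. The function name `pearson` is illustrative. Note that the invariance holds for positive scale factors, while a negative factor flips the sign of r.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# educ and IQ from the first ten sample rows of the report (not the full data).
educ = [12, 18, 14, 12, 11, 16, 10, 18, 15, 12]
iq = [93, 119, 108, 96, 74, 116, 91, 114, 111, 95]

r = pearson(educ, iq)

# Shift and rescale each variable separately with positive factors.
r_shifted = pearson([2 * e + 3 for e in educ], [0.5 * q - 10 for q in iq])

# A negative scale factor reverses the direction of the association.
r_flipped = pearson([-e for e in educ], iq)

print(r, r_shifted, r_flipped)
```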
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
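The pair-counting definition above corresponds to the variant often called tau-a. A minimal sketch follows, with the caveat that statistical libraries commonly report tau-b instead, which adjusts the denominator for ties, so results can differ when the data contain tied values. The function name `kendall_tau_a` and the toy inputs are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


same_order = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant
print(same_order, reversed_order, one_swap)
```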
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
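Nullity correlation can be illustrated by turning each column into a 0/1 "is missing" indicator and correlating the indicators. The sketch below does this for meduc and feduc using only the twenty sample rows listed in this report (the first ten and last ten), so the number it produces describes that small excerpt and not the full 935-row dataset. Using Pearson's r on the indicators is an assumption about how such a heatmap is typically computed, and the helper names are illustrative.

```python
from statistics import mean

# meduc and feduc from the report's first ten and last ten sample rows;
# None stands for the NaN entries shown there.
meduc = [8, 14, 14, 12, 6, 8, 8, 8, 14, 12, 7, 9, 12, 7, 16, 11, 8, 7, None, None]
feduc = [8, 14, 14, 12, 11, None, 8, None, 5, 11, 8, None, None, 7, 16, None, 6, None, 11, None]


def null_indicator(column):
    """1 where the value is missing, 0 where it is present."""
    return [1 if value is None else 0 for value in column]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


nullity_corr = pearson(null_indicator(meduc), null_indicator(feduc))
print(round(nullity_corr, 3))
```

A value near +1 would mean the two columns tend to be missing in the same rows, a value near -1 that one tends to be present exactly when the other is missing, and a value near 0 that the missingness patterns are unrelated.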
| | wage | hours | IQ | educ | exper | tenure | age | married | black | meduc | feduc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76900.0 | 40 | 93 | 12 | 11 | 2 | 31 | 1 | 0 | 8.0 | 8.0 |
| 1 | 80800.0 | 50 | 119 | 18 | 11 | 16 | 37 | 1 | 0 | 14.0 | 14.0 |
| 2 | 82500.0 | 40 | 108 | 14 | 11 | 9 | 33 | 1 | 0 | 14.0 | 14.0 |
| 3 | 65000.0 | 40 | 96 | 12 | 13 | 7 | 32 | 1 | 0 | 12.0 | 12.0 |
| 4 | 56200.0 | 40 | 74 | 11 | 14 | 5 | 34 | 1 | 0 | 6.0 | 11.0 |
| 5 | 140000.0 | 40 | 116 | 16 | 14 | 2 | 35 | 1 | 1 | 8.0 | NaN |
| 6 | 60000.0 | 40 | 91 | 10 | 13 | 0 | 30 | 0 | 0 | 8.0 | 8.0 |
| 7 | 108100.0 | 40 | 114 | 18 | 8 | 14 | 38 | 1 | 0 | 8.0 | NaN |
| 8 | 115400.0 | 45 | 111 | 15 | 13 | 1 | 36 | 1 | 0 | 14.0 | 5.0 |
| 9 | 100000.0 | 40 | 95 | 12 | 16 | 16 | 36 | 1 | 0 | 12.0 | 11.0 |
| | wage | hours | IQ | educ | exper | tenure | age | married | black | meduc | feduc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 925 | 64500.0 | 45 | 93 | 12 | 11 | 3 | 35 | 1 | 0 | 7.0 | 8.0 |
| 926 | 78800.0 | 40 | 100 | 11 | 15 | 6 | 32 | 1 | 1 | 9.0 | NaN |
| 927 | 64400.0 | 42 | 101 | 12 | 11 | 5 | 33 | 1 | 0 | 12.0 | NaN |
| 928 | 47700.0 | 45 | 100 | 12 | 9 | 3 | 31 | 1 | 0 | 7.0 | 7.0 |
| 929 | 66400.0 | 60 | 82 | 16 | 10 | 9 | 34 | 1 | 1 | 16.0 | 16.0 |
| 930 | 52000.0 | 40 | 79 | 16 | 6 | 1 | 30 | 1 | 1 | 11.0 | NaN |
| 931 | 120200.0 | 40 | 102 | 13 | 10 | 3 | 31 | 1 | 0 | 8.0 | 6.0 |
| 932 | 53800.0 | 45 | 77 | 12 | 12 | 10 | 28 | 1 | 1 | 7.0 | NaN |
| 933 | 87300.0 | 44 | 109 | 12 | 12 | 12 | 28 | 1 | 0 | NaN | 11.0 |
| 934 | 100000.0 | 40 | 107 | 12 | 17 | 18 | 35 | 1 | 0 | NaN | NaN |